Impact of Dietary Shifts on Gut Microbiome Dynamics

Multivariate Insights Using R

R for Bio Data Analysis

Group 16: Eric Torres, Lucia de Lamadrid, Konstantina Gkopi, Elena Iriondo and Jorge Santiago

2024-12-03

Introduction

Our aim:

To study the relationship between the composition of the gut microbiota and factors such as diet and colonisation history.

Materials and Methods

General Workflow

MICROBIOME METADATA:

# A tibble: 6 × 6,701
   Diet Source Donor CollectionMet   Sex     OTU0     OTU1     OTU2     OTU3
  <dbl>  <dbl> <dbl>         <dbl> <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1     0      0     0             0     0 1.56e-11 4.72e-11 1.23e-11 4.52e-11
2     0      1     0             0     0 2.36e-11 9.53e-11 3.33e-11 2.67e-11
3     0      2     0             1     0 6.77e-11 3.68e-11 8.02e-11 5.49e-11
4     0      2     0             0     0 5.52e-11 9.89e-11 4.58e-11 3.54e-11
5     0      3     0             0     0 5.24e-11 6.34e-11 2.35e-11 7.47e-11
6     0      4     0             1     0 7.67e-11 7.22e-11 5.41e-11 1.20e-11
# ℹ 6,692 more variables: OTU4 <dbl>, OTU5 <dbl>, OTU6 <dbl>, OTU7 <dbl>,
#   OTU8 <dbl>, OTU9 <dbl>, OTU10 <dbl>, OTU11 <dbl>, OTU12 <dbl>, OTU13 <dbl>,
#   OTU14 <dbl>, OTU15 <dbl>, OTU16 <dbl>, OTU17 <dbl>, OTU18 <dbl>,
#   OTU19 <dbl>, OTU20 <dbl>, OTU21 <dbl>, OTU22 <dbl>, OTU23 <dbl>,
#   OTU24 <dbl>, OTU25 <dbl>, OTU26 <dbl>, OTU27 <dbl>, OTU28 <dbl>,
#   OTU29 <dbl>, OTU30 <dbl>, OTU31 <dbl>, OTU32 <dbl>, OTU33 <dbl>,
#   OTU34 <dbl>, OTU35 <dbl>, OTU36 <dbl>, OTU37 <dbl>, OTU38 <dbl>, …

OTU TAXONOMY GLOSSARY:

  OTU.ID  Kingdom        Phylum         Class           Order
1   OTU0 Bacteria                                            
2   OTU1 Bacteria    Firmicutes    Clostridia   Clostridiales
3   OTU2 Bacteria    Firmicutes       Bacilli Lactobacillales
4   OTU3 Bacteria Bacteroidetes Bacteroidetes   Bacteroidales
5   OTU4 Bacteria Bacteroidetes                              
6   OTU5 Bacteria    Firmicutes    Clostridia   Clostridiales
              Family           Genus X X.1
1                                         
2    Ruminococcaceae                      
3    Enterococcaceae    Enterococcus      
4 Porphyromonadaceae Parabacteroides      
5                                         
6                                         

Data Tidying and Filtering

  • Added a SampleID column to uniquely identify each sample.

  • Transformed the dataset from wide to long format for easier analysis.

  • Kept the OTUs contributing up to 95% of cumulative abundance.

  • Replaced the numeric codes with descriptive labels.

# Creation and relocation of SampleID
metadata_df <- metadata_df |>
  mutate(SampleID = row_number()) |>  # Create SampleID from the row index
  relocate(SampleID, 
           .before = everything())  # Move SampleID to the first position

metadata_df_long <- metadata_df |> 
  pivot_longer(
    cols = starts_with("OTU"), 
    names_to = "OTU", 
    values_to = "rel_abundance"
  )

head(metadata_df_long)

# Calculate cumulative contribution
cumulative_otus <- metadata_df_long |>
  group_by(OTU) |>
  summarize(mean_abundance = mean(rel_abundance)) |>
  arrange(desc(mean_abundance)) |>
  mutate(cumulative_abundance = cumsum(mean_abundance) / sum(mean_abundance))

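# Filter OTUs contributing to 95% cumulative abundance
otus_to_keep <- cumulative_otus |>
  filter(cumulative_abundance <= 0.95) |>
  pull(OTU)

# Keep only the rows for the retained OTUs
# (assumed step: filtered_metadata is used below but not defined on the slide)
filtered_metadata <- metadata_df_long |>
  filter(OTU %in% otus_to_keep)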

# Number of OTUs before filtering
n_total_otus <- metadata_df_long |> 
  pull(OTU) |> 
  n_distinct()

# Number of OTUs after filtering
n_filtered_otus <- filtered_metadata |> 
  pull(OTU) |> 
  n_distinct()
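
To see what the cumulative filter above retains, here is a minimal sketch on toy data. The OTU names and abundances are invented for illustration and are not taken from the dataset.

```r
library(dplyr)
library(tibble)

# Toy mean abundances for five hypothetical OTUs (illustrative values only)
toy_otus <- tibble(
  OTU = c("OTU_a", "OTU_b", "OTU_c", "OTU_d", "OTU_e"),
  mean_abundance = c(0.50, 0.30, 0.10, 0.06, 0.04)
)

# Rank OTUs by abundance and compute their running share of the total
toy_cumulative <- toy_otus |>
  arrange(desc(mean_abundance)) |>
  mutate(cumulative_abundance = cumsum(mean_abundance) / sum(mean_abundance))

# Same rule as above: keep OTUs whose running share is at most 0.95
toy_keep <- toy_cumulative |>
  filter(cumulative_abundance <= 0.95) |>
  pull(OTU)

toy_keep
```

In this toy case the OTU that crosses the 0.95 threshold (OTU_d, running share 0.96) is dropped, so the retained OTUs cover 90% of the total rather than a full 95%.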

filtered_metadata_stricter_label <- filtered_metadata_stricter |> 
  mutate(Diet = case_when(Diet == 0 ~ "LFPP",
                          Diet == 1 ~ "Western",
                          Diet == 2 ~ "CARBR",
                          Diet == 3 ~ "FATR",
                          Diet == 4 ~ "Suckling",
                          Diet == 5 ~ "Human")) |> 
  mutate(Source = case_when(Source == 0 ~ "Cecum1",
                          Source == 1 ~ "Cecum2", 
                          Source == 2 ~ "Colon1", 
                          Source == 3 ~ "Colon2", 
                          Source == 4 ~ "Feces",
                          Source == 5 ~ "SI1",
                          Source == 6 ~ "SI13", 
                          Source == 7 ~ "SI15", 
                          Source == 8 ~ "SI2", 
                          Source == 9 ~ "SI5",
                          Source == 10 ~ "SI9", 
                          Source == 11 ~ "Stomach", 
                          Source == 12 ~ "Cecum")) |> 
  mutate(Donor = case_when(Donor == 0 ~ "HMouseLFPP",
                          Donor == 1 ~ "CONVR", 
                          Donor == 2 ~ "Human", 
                          Donor == 3 ~ "Fresh", 
                          Donor == 4 ~ "Frozen",
                          Donor == 5 ~ "HMouseWestern", 
                          Donor == 6 ~ "CONVD")) |> 
  mutate(CollectionMet = case_when(CollectionMet == 0 ~ "Contents",
                                   CollectionMet == 1 ~ "Scraping")) |> 
  mutate(Sex = case_when(Sex == 0 ~ "Male",
                         Sex == 1 ~ "Female")) 
head(filtered_metadata_stricter_label)

Now, our data is tidy!

# A tibble: 6 × 8
  SampleID Diet  Source Donor      CollectionMet Sex   OTU   rel_abundance
     <dbl> <chr> <chr>  <chr>      <chr>         <chr> <chr>         <dbl>
1        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU6       3.31e-11
2        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU7       5.08e-11
3        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU9       2.57e- 3
4        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU41      7.95e-11
5        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU58      2.53e-11
6        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU77      1.28e- 3

and ready to be augmented…

We use the OTU taxonomy file to add Phylum and Class columns for each OTU with left_join().

clean_df_taxonomy <- clean_df |>  
  left_join(otu_df_modified, 
            join_by(OTU == OTU.ID)) |> 
  relocate(Phylum, Class, .after = OTU) 

head(clean_df_taxonomy)
# A tibble: 6 × 10
  SampleID Diet  Source Donor      CollectionMet Sex   OTU   Phylum     Class   
     <dbl> <chr> <chr>  <chr>      <chr>         <chr> <chr> <chr>      <chr>   
1        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU6  Firmicutes Bacilli 
2        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU7  Firmicutes Clostri…
3        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU9  Firmicutes Clostri…
4        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU41 Firmicutes Bacilli 
5        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU58 Firmicutes Clostri…
6        1 LFPP  Cecum1 HMouseLFPP Contents      Male  OTU77 Firmicutes Clostri…
# ℹ 1 more variable: rel_abundance <dbl>

Results and Discussion

Microbiota composition at the phylum level across:

  • sources and diet types

  • diet and donor combinations

Principal Component Analysis on Phylum-Level Aggregated Microbiome Data

# A tibble: 6 × 9
  SampleID  Diet Actinobacteria Bacteroidetes Firmicutes Proteobacteria      TM7
     <dbl> <dbl>          <dbl>         <dbl>      <dbl>          <dbl>    <dbl>
1      339     1        0.00316        0.463       0.470        0.00671 7.70e-11
2      340     1        0.0180         0.106       0.811        0.00138 3.72e-11
3      341     1        0.0121         0.0443      0.873        0.00792 5.68e-11
4      342     1        0.00531        0.145       0.786        0.0135  7.89e-12
5      343     1        0.00801        0.607       0.300        0.0300  3.90e-11
6      344     1        0.631          0.199       0.141        0.00467 3.98e-11
# ℹ 2 more variables: Unclassified <dbl>, Verrucomicrobia <dbl>
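
A table like the one above can be produced by summing OTU abundances per sample and phylum, then spreading the phyla into columns before running the PCA. The following is a minimal sketch under that assumption, using invented toy data. The names toy_long, toy_phylum_wide and toy_pca are hypothetical, and the choice to centre and scale is an assumption rather than the setting used for the slides.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy long-format data: 4 hypothetical samples x 4 OTUs (values are invented)
toy_long <- tibble(
  SampleID = rep(1:4, each = 4),
  Diet = rep(c("LFPP", "LFPP", "Western", "Western"), each = 4),
  Phylum = rep(c("Firmicutes", "Firmicutes", "Bacteroidetes", "Proteobacteria"),
               times = 4),
  rel_abundance = c(0.30, 0.20, 0.45, 0.05,
                    0.25, 0.30, 0.35, 0.10,
                    0.50, 0.30, 0.15, 0.05,
                    0.45, 0.35, 0.18, 0.02)
)

# Sum OTU abundances within each sample and phylum, then pivot to wide format
toy_phylum_wide <- toy_long |>
  group_by(SampleID, Diet, Phylum) |>
  summarize(phylum_abundance = sum(rel_abundance), .groups = "drop") |>
  pivot_wider(names_from = Phylum, values_from = phylum_abundance)

# PCA on the phylum columns only, centred and scaled to unit variance
toy_pca <- toy_phylum_wide |>
  select(-SampleID, -Diet) |>
  prcomp(center = TRUE, scale. = TRUE)

summary(toy_pca)
```

Because relative abundances sum to one within a sample, the phylum columns are linearly dependent, so the last principal component carries essentially no variance.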

Analysis of Microbiome Clusters by Donor Groups Using Hierarchical Clustering

  • We compute a Euclidean distance matrix to measure dissimilarity between microbiome samples, then cluster with Ward’s method, which minimizes the within-cluster variance.

    # Compute Euclidean distance matrix
    dist_matrix <- otu_data_scaled |>
      dist()
    
    # Perform hierarchical clustering
    hclust_result <- hclust(dist_matrix, method = "ward.D2")
    
    # Cut dendrogram into 3 clusters
    cluster_labels <- cutree(hclust_result, k = 3) |>
      as_tibble() |>
      rename(Cluster = value)
  • Chi-Squared Test

    # Perform chi-squared test
    chi2_result <- chisq.test(donor_cluster_table)
    chi2_result
    
        Pearson's Chi-squared test
    
    data:  donor_cluster_table
    X-squared = 659.91, df = 8, p-value < 2.2e-16
  • Cluster 1 is dominated by HMouseLFPP (55.5%) with notable contributions from Frozen (17.8%) and Fresh (18.7%), reflecting plant-rich diets and preserved samples.
  • Cluster 2 includes mostly Fresh (55.1%) and HMouseLFPP (26.5%), indicating a mix of human-derived and dietary influences.
  • Cluster 3 is almost entirely CONVR (95%), representing natural microbiota from control mice.
  • The chi-squared test (X-squared = 659.91, df = 8, p < 2.2e-16) indicates a significant association between donor origin and cluster membership, highlighting the influence of the donor on microbiota composition.
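
The construction of donor_cluster_table is not shown above. One plausible way to build such a contingency table, and to obtain the per-cluster donor percentages, is sketched below. The counts are invented and toy_counts, toy_table, toy_chi2 and cluster_composition are hypothetical names.

```r
library(tibble)

# Toy donor-by-cluster counts (invented; 3 donors x 3 clusters)
toy_counts <- tribble(
  ~Donor,       ~Cluster, ~n,
  "CONVR",      1,         2,
  "CONVR",      2,         3,
  "CONVR",      3,        35,
  "Fresh",      1,        10,
  "Fresh",      2,        25,
  "Fresh",      3,         5,
  "HMouseLFPP", 1,        28,
  "HMouseLFPP", 2,         8,
  "HMouseLFPP", 3,         4
)

# Contingency table with donors as rows and clusters as columns
toy_table <- xtabs(n ~ Donor + Cluster, data = toy_counts)

# Test of independence between donor and cluster
toy_chi2 <- chisq.test(toy_table)

# Donor composition of each cluster, as column percentages
cluster_composition <- prop.table(toy_table, margin = 2) * 100
round(cluster_composition, 1)
```

The degrees of freedom equal (rows - 1) x (columns - 1), so this 3 x 3 toy table gives df = 4. The df = 8 reported above is consistent with a table of five donors by three clusters.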

Biodiversity and diet

Shannon diversity index

  • Number of species living in a habitat (richness)
  • Relative abundance (evenness).

\[ H' = -\sum_{i=1}^{R} p_i \ln p_i \]

where \(p_i\) is the relative abundance of OTU \(i\) and \(R\) is the total number of OTUs.
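
The formula translates directly into a few lines of R. This is a minimal sketch on invented data, not necessarily the implementation used for the results. shannon_index and toy_samples are hypothetical names, and an established alternative is vegan::diversity().

```r
library(dplyr)
library(tibble)

# Shannon diversity for one sample, given a vector of OTU abundances
shannon_index <- function(abundance) {
  p <- abundance / sum(abundance)  # renormalise so the p_i sum to 1
  p <- p[p > 0]                    # treat 0 * ln(0) as 0
  -sum(p * log(p))                 # log() is the natural logarithm in R
}

# Two toy samples: one perfectly even, one dominated by a single OTU
toy_samples <- tibble(
  SampleID = rep(c(1, 2), each = 4),
  rel_abundance = c(0.25, 0.25, 0.25, 0.25,
                    0.97, 0.01, 0.01, 0.01)
)

# One Shannon value per sample
toy_diversity <- toy_samples |>
  group_by(SampleID) |>
  summarize(shannon = shannon_index(rel_abundance))

toy_diversity
```

Both toy samples have the same richness (four OTUs), but the even sample reaches the maximum value ln(4) while the dominated one scores much lower, which shows how the index combines richness and evenness.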

Biodiversity in the microbiota of first-generation humanized mice was found to differ significantly across different diets.

Conclusion

  • The “obesity-inducing” diet influences the Firmicutes-Bacteroidetes ratio.

  • PCA shows how diet shapes microbial composition, as well as the relationships between phyla.

  • Clustering sheds light on how the microbiota donor structures the data.

  • The Western diet favours a more biodiverse gut ecosystem.